[Reduce_then_scan refactor pt 2] Relaxing sub-group size requirement #2657
Open
danhoeflinger wants to merge 21 commits into
Pull request overview
This PR continues the reduce_then_scan refactor by removing hard-coded compile-time sub-group sizes (e.g., 32/16) and expanding the applicability of the reduce-then-scan pattern to more devices (including CPU), while attempting to preserve performance through runtime sub-group queries and adjusted work-group sizing.
Changes:
- Removes the device capability gating around `reduce_then_scan` and switches several algorithms to always use it (with remaining gating only for limited-output cases).
- Refactors sub-group scan building blocks to query sub-group sizing at runtime and updates downstream KT utilities to match the new API.
- Adjusts CPU work-group sizing caps and communication strategy (favoring SLM-based comms on CPU / non-trivially-copyable types).
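To illustrate what querying the sub-group size at runtime means for a scan building block, here is a minimal host-side C++ model. It is a sketch under stated assumptions, not oneDPL code: the name `sub_group_inclusive_scan` is hypothetical, the work-items are modeled as vector elements, and the read of `prev[lane - offset]` stands in for what a SYCL kernel would do with `sycl::shift_group_right`. The sub-group width is an ordinary runtime argument rather than a template parameter, and the last sub-group may be partial.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side model of a per-sub-group inclusive scan (Hillis-Steele).
// sg_size is a runtime argument instead of a template parameter; the input is
// split into consecutive sub-groups of sg_size "lanes", and the last may be partial.
template <typename T, typename BinaryOp>
std::vector<T> sub_group_inclusive_scan(const std::vector<T>& in, std::size_t sg_size, BinaryOp op)
{
    assert(sg_size > 0 && "sub-group size must be positive");
    std::vector<T> out(in);
    for (std::size_t base = 0; base < out.size(); base += sg_size)
    {
        const std::size_t n = std::min(sg_size, out.size() - base); // active lanes in this sub-group
        const auto first = out.begin() + static_cast<std::ptrdiff_t>(base);
        // ceil(log2(n)) rounds; each round combines a lane with the lane `offset` positions to its left.
        for (std::size_t offset = 1; offset < n; offset <<= 1)
        {
            // Snapshot so every lane reads pre-round values, as a simultaneous
            // sycl::shift_group_right(sg, x, offset) would in a real kernel.
            const std::vector<T> prev(first, first + static_cast<std::ptrdiff_t>(n));
            for (std::size_t lane = offset; lane < n; ++lane)
                out[base + lane] = op(prev[lane - offset], prev[lane]);
        }
    }
    return out;
}
```

Because the width is a plain argument, the same code path serves sub-groups of 8, 16, or 32 lanes; the left operand is always the earlier element, so non-commutative operations also scan correctly.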
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl.h` | Removes gating/fallbacks so more scan/copy/set operations use reduce-then-scan by default. |
| `include/oneapi/dpl/pstl/hetero/dpcpp/parallel_backend_sycl_reduce_then_scan.h` | Removes compile-time sub-group size template params; adds runtime device sub-group size queries and new work-group caps. |
| `include/oneapi/dpl/experimental/kt/internal/sub_group/sub_group_scan.h` | Updates KT sub-group scan wrapper calls to the new reduce-then-scan scan primitive signatures. |
| `include/oneapi/dpl/experimental/kt/internal/cooperative_lookback.h` | Updates cooperative lookback’s use of sub-group scan primitives to match new templates. |
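The runtime device queries and the communication choice summarized above can be sketched in plain C++. This is a hypothetical policy, not the PR's code: the helper names, the preference order (32, then 16, then the largest reported size), and the fallback of 1 are assumptions. In SYCL 2020 the list of supported sizes would come from `device.get_info<sycl::info::device::sub_group_sizes>()`, and the trivially-copyable check reflects the SYCL 2020 restriction that sub-group shuffles such as `sycl::shift_group_right` accept only trivially copyable types.

```cpp
#include <algorithm>
#include <cstddef>
#include <initializer_list>
#include <type_traits>
#include <vector>

// Hypothetical policy (not oneDPL's): prefer 32, the size reduce_then_scan used
// to require, then 16, otherwise the largest size the device reports.
inline std::size_t choose_sub_group_size(const std::vector<std::size_t>& supported)
{
    for (std::size_t preferred : {std::size_t{32}, std::size_t{16}})
    {
        if (std::find(supported.begin(), supported.end(), preferred) != supported.end())
            return preferred;
    }
    // Fallback: largest reported size, or 1 if the list is unexpectedly empty.
    return supported.empty() ? std::size_t{1} : *std::max_element(supported.begin(), supported.end());
}

// How work-items exchange partial results within a sub-group.
enum class comm_strategy
{
    sub_group_shuffle, // e.g. sycl::shift_group_right in a real kernel
    local_memory       // shared local memory (SLM) plus a group barrier
};

// Favor SLM on CPU, and whenever T cannot be shuffled because it is not trivially copyable.
template <typename T>
constexpr comm_strategy choose_comm_strategy(bool is_cpu)
{
    return (is_cpu || !std::is_trivially_copyable_v<T>) ? comm_strategy::local_memory
                                                         : comm_strategy::sub_group_shuffle;
}
```

Keeping the type check at compile time means a non-trivially-copyable `T` never instantiates a shuffle path, while the CPU decision remains a runtime property of the device.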
This reverts commit 4f46e97.
This reverts commit 0af5084.
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
Relaxes the sub-group size 32 / 16 requirement for `reduce_then_scan` without sacrificing performance.
- … `sycl::reqd_sub_group_size`: this can in practice be treated as a constexpr to enable optimizations anyway.
- Replaces `[[sycl::reqd_sub_group_size(...)]]` with `[[_ONEDPL_SYCL_REQD_SUB_GROUP_SIZE_IF_SUPPORTED(32)]]` to allow the kernel to run on devices that don't support sub-group size 32.

Full picture: